[clang-format] Add an option to format integer and float literal case #151590

Open: wants to merge 5 commits into base: main

30Wedge commented Jul 31, 2025

Some languages have the flexibility to use upper or lower case characters interchangeably in integer and float literal definitions.

I'd like to be able to enforce a consistent case style in one of my projects, so I added this clang-format style option to control it.

With this .clang-format configuration:

    NumericLiteralCaseStyle:
      UpperCasePrefix: Never
      UpperCaseHexDigit: Always
      UpperCaseSuffix: Never

This line of code:

    unsigned long long  0XdEaDbEeFUll;

gets reformatted into this line of code:

    unsigned long long 0xDEADBEEFull;

I'm new to this project, so please let me know if I missed something in the process. I modeled this PR on IntegerLiteralSeparatorFixer.
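
To make the intended effect concrete, here is a minimal standalone sketch of the transformation on a hexadecimal integer literal. It is not code from the patch: `applyCase` and `recaseHexLiteral` are hypothetical names, the three mode arguments borrow the `-1` / `0` / `1` encoding that appears in the patch's documentation, and the sketch assumes the input starts with `0x` or `0X` and contains no digit separators, fraction, or exponent.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not in the patch): -1 lowers, 1 uppers, 0 leaves as is.
static char applyCase(char C, int Mode) {
  const auto U = static_cast<unsigned char>(C);
  if (Mode < 0)
    return static_cast<char>(std::tolower(U));
  if (Mode > 0)
    return static_cast<char>(std::toupper(U));
  return C;
}

// Recase a hexadecimal integer literal such as "0XdEaDbEeFUll".
// Assumption: the literal starts with "0x"/"0X" and has no separators,
// fractional part, or exponent.
std::string recaseHexLiteral(std::string Literal, int PrefixCase,
                             int HexDigitCase, int SuffixCase) {
  std::size_t I = 0;
  // Prefix: the two characters "0x" or "0X".
  for (; I < 2 && I < Literal.size(); ++I)
    Literal[I] = applyCase(Literal[I], PrefixCase);
  // Hex digits: everything up to the first non-hex character.
  for (; I < Literal.size() &&
         std::isxdigit(static_cast<unsigned char>(Literal[I]));
       ++I)
    Literal[I] = applyCase(Literal[I], HexDigitCase);
  // Suffix: whatever remains, for example "Ull".
  for (; I < Literal.size(); ++I)
    Literal[I] = applyCase(Literal[I], SuffixCase);
  return Literal;
}
```

Calling `recaseHexLiteral("0XdEaDbEeFUll", -1, 1, -1)` lowers the prefix, raises the digits, and lowers the suffix independently of one another, which is the behaviour the example above describes.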

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

llvmbot added the clang and clang-format labels Jul 31, 2025

llvmbot commented Jul 31, 2025

@llvm/pr-subscribers-clang

Author: Andy MacGregor (30Wedge)

Changes

Some languages have the flexibility to use upper or lower case characters interchangeably in integer and float literal definitions.

I'd like to be able to enforce a consistent case style in one of my projects, so I added this clang-format style option to control it.

With this .clang-format configuration:

    NumericLiteralCase:
      PrefixCase: -1
      HexDigitCase: 1
      SuffixCase: -1

This line of code:

    unsigned long long  0XdEaDbEeFUll;

gets reformatted into this line of code:

    unsigned long long 0xDEADBEEFull;

I'm new to this project, so please let me know if I missed something in the process. I modeled this PR on IntegerLiteralSeparatorFixer.


Patch is 34.32 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/151590.diff

9 Files Affected:

  • (modified) clang/docs/ClangFormatStyleOptions.rst (+72-1)
  • (modified) clang/docs/ReleaseNotes.rst (+2)
  • (modified) clang/include/clang/Format/Format.h (+49)
  • (modified) clang/lib/Format/CMakeLists.txt (+1)
  • (modified) clang/lib/Format/Format.cpp (+19)
  • (added) clang/lib/Format/NumericLiteralCaseFixer.cpp (+368)
  • (added) clang/lib/Format/NumericLiteralCaseFixer.h (+32)
  • (modified) clang/unittests/Format/CMakeLists.txt (+1)
  • (added) clang/unittests/Format/NumericLiteralCaseTest.cpp (+354)
diff --git a/clang/docs/ClangFormatStyleOptions.rst b/clang/docs/ClangFormatStyleOptions.rst
index 02986a94a656c..abc73b0ae183c 100644
--- a/clang/docs/ClangFormatStyleOptions.rst
+++ b/clang/docs/ClangFormatStyleOptions.rst
@@ -4555,7 +4555,6 @@ the configuration (without a prefix: ``Auto``).
     So inserting a trailing comma counteracts bin-packing.
 
 
-
 .. _IntegerLiteralSeparator:
 
 **IntegerLiteralSeparator** (``IntegerLiteralSeparatorStyle``) :versionbadge:`clang-format 16` :ref:`¶ <IntegerLiteralSeparator>`
@@ -5076,6 +5075,78 @@ the configuration (without a prefix: ``Auto``).
 
   For example: TESTSUITE
 
+.. _NumericLiteralCase:
+
+**NumericLiteralCase** (``NumericLiteralCaseStyle``) :versionbadge:`clang-format 22` :ref:`¶ <NumericLiteralCase>`
+  Controls character case in numeric literals.
+
+  Possible values for each nested configuration flag:
+
+  * ``0`` (Default) Do not modify characters.
+
+  * ``-1`` Convert characters to lower case.
+
+  * ``1`` Convert characters to upper case.
+
+  .. code-block:: yaml
+
+    # Example of usage:
+    NumericLiteralCase:
+      PrefixCase: -1
+      HexDigitCase: 1
+      FloatExponentSeparatorCase: 0
+      SuffixCase: -1
+
+  .. code-block:: c++
+
+    // Lower case prefix, upper case hexadecimal digits, lower case suffix
+    unsigned int 0xDEADBEEFull;
+
+  Nested configuration flags:
+
+  * ``int PrefixCase`` Control numeric constant prefix case.
+
+    .. code-block:: c++
+
+      // PrefixCase: 1
+      int a = 0B101 | 0XF0;
+      // PrefixCase: -1
+      int b = 0b101 | 0xF0;
+      // PrefixCase: 0
+      int c = 0b101 | 0XF0;
+
+  * ``int HexDigitCase`` Control hexadecimal digit case.
+
+    .. code-block:: c++
+
+      // HexDigitCase: 1
+      int a = 0xBEAD;
+      // HexDigitCase: -1
+      int b = 0xbead;
+      // HexDigitCase: 0
+      int c = 0xBeAd;
+
+  * ``int FloatExponentSeparatorCase`` Control exponent separator case.
+
+    .. code-block:: c++
+
+      // FloatExponentSeparatorCase: 1
+      float a = 6.02E+23;
+      // FloatExponentSeparatorCase: -1
+      float b = 6.02e+23;
+
+  * ``int SuffixCase`` Control suffix case.
+
+    .. code-block:: c++
+
+      // SuffixCase: 1
+      unsigned long long a = 1ULL;
+      // SuffixCase: -1
+      unsigned long long b = 1ull;
+      // SuffixCase: 0
+      unsigned long long c = 1uLL;
+
+
 .. _ObjCBinPackProtocolList:
 
 **ObjCBinPackProtocolList** (``BinPackStyle``) :versionbadge:`clang-format 7` :ref:`¶ <ObjCBinPackProtocolList>`
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 4a2edae7509de..f45363f86c135 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -230,6 +230,8 @@ AST Matchers
 
 clang-format
 ------------
+- Add ``NumericLiteralCase`` option for enforcing character case in
+  numeric literals.
 
 libclang
 --------
diff --git a/clang/include/clang/Format/Format.h b/clang/include/clang/Format/Format.h
index 31582a40de866..301db5012b980 100644
--- a/clang/include/clang/Format/Format.h
+++ b/clang/include/clang/Format/Format.h
@@ -3100,6 +3100,54 @@ struct FormatStyle {
   /// \version 11
   TrailingCommaStyle InsertTrailingCommas;
 
+  /// Character case format for different components of a numeric literal.
+  ///
+  /// For all options, ``0`` leaves the case unchanged, ``-1``
+  /// uses lower case, and ``1`` uses upper case.
+  ///
+  struct NumericLiteralCaseStyle {
+    /// Format numeric constant prefixes.
+    /// \code{.text}
+    ///   /* -1: lower case */ b = 0x01;
+    ///   /*  0: don't care */
+    ///   /*  1: upper case */ b = 0X01;
+    /// \endcode
+    int8_t PrefixCase;
+    /// Format hexadecimal digit case.
+    /// \code{.text}
+    ///   /* -1: lower case */ b = 0xabcdef;
+    ///   /*  0: don't care */
+    ///   /*  1: upper case */ b = 0xABCDEF;
+    /// \endcode
+    int8_t HexDigitCase;
+    /// Format exponent separator character case in floating point literals.
+    /// \code{.text}
+    ///   /* -1: lower case */ b = 6.02e23;
+    ///   /*  0: don't care */
+    ///   /*  1: upper case */ b = 6.02E23;
+    /// \endcode
+    int8_t FloatExponentSeparatorCase;
+    /// Format suffix case. This option excludes case-specific reserved
+    /// suffixes, such as ``min`` in C++.
+    /// \code{.text}
+    ///   /* -1: lower case */ b = 10u;
+    ///   /*  0: don't care */
+    ///   /*  1: upper case */ b = 10U;
+    /// \endcode
+    int8_t SuffixCase;
+
+    bool operator==(const NumericLiteralCaseStyle &R) const {
+      return PrefixCase == R.PrefixCase && HexDigitCase == R.HexDigitCase &&
+             FloatExponentSeparatorCase == R.FloatExponentSeparatorCase &&
+             SuffixCase == R.SuffixCase;
+    }
+  };
+
+  /// Format numeric literals for languages that support flexible character case
+  /// in numeric literal constants.
+  /// \version 22
+  NumericLiteralCaseStyle NumericLiteralCase;
+
   /// Separator format of integer literals of different bases.
   ///
   /// If negative, remove separators. If  ``0``, leave the literal as is. If
@@ -5424,6 +5472,7 @@ struct FormatStyle {
            IndentWrappedFunctionNames == R.IndentWrappedFunctionNames &&
            InsertBraces == R.InsertBraces &&
            InsertNewlineAtEOF == R.InsertNewlineAtEOF &&
+           NumericLiteralCase == R.NumericLiteralCase &&
            IntegerLiteralSeparator == R.IntegerLiteralSeparator &&
            JavaImportGroups == R.JavaImportGroups &&
            JavaScriptQuotes == R.JavaScriptQuotes &&
diff --git a/clang/lib/Format/CMakeLists.txt b/clang/lib/Format/CMakeLists.txt
index 9f4939824fdb8..a003f1a951af6 100644
--- a/clang/lib/Format/CMakeLists.txt
+++ b/clang/lib/Format/CMakeLists.txt
@@ -13,6 +13,7 @@ add_clang_library(clangFormat
   MacroExpander.cpp
   MatchFilePath.cpp
   NamespaceEndCommentsFixer.cpp
+  NumericLiteralCaseFixer.cpp
   ObjCPropertyAttributeOrderFixer.cpp
   QualifierAlignmentFixer.cpp
   SortJavaScriptImports.cpp
diff --git a/clang/lib/Format/Format.cpp b/clang/lib/Format/Format.cpp
index 063780721423f..711a3e7501328 100644
--- a/clang/lib/Format/Format.cpp
+++ b/clang/lib/Format/Format.cpp
@@ -16,6 +16,7 @@
 #include "DefinitionBlockSeparator.h"
 #include "IntegerLiteralSeparatorFixer.h"
 #include "NamespaceEndCommentsFixer.h"
+#include "NumericLiteralCaseFixer.h"
 #include "ObjCPropertyAttributeOrderFixer.h"
 #include "QualifierAlignmentFixer.h"
 #include "SortJavaScriptImports.h"
@@ -382,6 +383,16 @@ struct ScalarEnumerationTraits<FormatStyle::IndentExternBlockStyle> {
   }
 };
 
+template <> struct MappingTraits<FormatStyle::NumericLiteralCaseStyle> {
+  static void mapping(IO &IO, FormatStyle::NumericLiteralCaseStyle &Base) {
+    IO.mapOptional("PrefixCase", Base.PrefixCase);
+    IO.mapOptional("HexDigitCase", Base.HexDigitCase);
+    IO.mapOptional("FloatExponentSeparatorCase",
+                   Base.FloatExponentSeparatorCase);
+    IO.mapOptional("SuffixCase", Base.SuffixCase);
+  }
+};
+
 template <> struct MappingTraits<FormatStyle::IntegerLiteralSeparatorStyle> {
   static void mapping(IO &IO, FormatStyle::IntegerLiteralSeparatorStyle &Base) {
     IO.mapOptional("Binary", Base.Binary);
@@ -1093,6 +1104,7 @@ template <> struct MappingTraits<FormatStyle> {
     IO.mapOptional("InsertBraces", Style.InsertBraces);
     IO.mapOptional("InsertNewlineAtEOF", Style.InsertNewlineAtEOF);
     IO.mapOptional("InsertTrailingCommas", Style.InsertTrailingCommas);
+    IO.mapOptional("NumericLiteralCase", Style.NumericLiteralCase);
     IO.mapOptional("IntegerLiteralSeparator", Style.IntegerLiteralSeparator);
     IO.mapOptional("JavaImportGroups", Style.JavaImportGroups);
     IO.mapOptional("JavaScriptQuotes", Style.JavaScriptQuotes);
@@ -1618,6 +1630,9 @@ FormatStyle getLLVMStyle(FormatStyle::LanguageKind Language) {
   LLVMStyle.InsertBraces = false;
   LLVMStyle.InsertNewlineAtEOF = false;
   LLVMStyle.InsertTrailingCommas = FormatStyle::TCS_None;
+  LLVMStyle.NumericLiteralCase = {/*PrefixCase=*/0, /*HexDigitCase=*/0,
+                                  /*FloatExponentSeparatorCase=*/0,
+                                  /*SuffixCase=*/0};
   LLVMStyle.IntegerLiteralSeparator = {
       /*Binary=*/0,  /*BinaryMinDigits=*/0,
       /*Decimal=*/0, /*DecimalMinDigits=*/0,
@@ -3872,6 +3887,10 @@ reformat(const FormatStyle &Style, StringRef Code,
     return IntegerLiteralSeparatorFixer().process(Env, Expanded);
   });
 
+  Passes.emplace_back([&](const Environment &Env) {
+    return NumericLiteralCaseFixer().process(Env, Expanded);
+  });
+
   if (Style.isCpp()) {
     if (Style.QualifierAlignment != FormatStyle::QAS_Leave)
       addQualifierAlignmentFixerPasses(Expanded, Passes);
diff --git a/clang/lib/Format/NumericLiteralCaseFixer.cpp b/clang/lib/Format/NumericLiteralCaseFixer.cpp
new file mode 100644
index 0000000000000..88adaf83fe381
--- /dev/null
+++ b/clang/lib/Format/NumericLiteralCaseFixer.cpp
@@ -0,0 +1,368 @@
+//===--- NumericLiteralCaseFixer.cpp -----------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// This file implements NumericLiteralCaseFixer that standardizes character
+/// case within numeric literal constants.
+///
+//===----------------------------------------------------------------------===//
+
+#include "NumericLiteralCaseFixer.h"
+
+#include "llvm/ADT/StringExtras.h"
+
+#include <algorithm>
+
+namespace clang {
+namespace format {
+
+using CharTransformFn = char (*)(char C);
+namespace {
+
+/// @brief Collection of std::transform predicates for each part of a numeric
+/// literal
+struct FormatParameters {
+  FormatParameters(FormatStyle::LanguageKind Language,
+                   const FormatStyle::NumericLiteralCaseStyle &CaseStyle);
+
+  CharTransformFn Prefix;
+  CharTransformFn HexDigit;
+  CharTransformFn FloatExponentSeparator;
+  CharTransformFn Suffix;
+
+  char Separator;
+};
+
+/// @brief Parse a single numeric constant from text into ranges that are
+/// appropriate for applying NumericLiteralCaseStyle rules.
+class QuickNumericalConstantParser {
+public:
+  QuickNumericalConstantParser(const StringRef &IntegerLiteral,
+                               const FormatParameters &Transforms);
+
+  /// @brief Reformats the numeric constant if needed.
+  /// Calling this method invalidates the object's state.
+  /// @return std::nullopt if no reformatting is required. std::optional<>
+  /// containing the reformatted string otherwise.
+  std::optional<std::string> formatIfNeeded() &&;
+
+private:
+  const StringRef &IntegerLiteral;
+  const FormatParameters &Transforms;
+
+  std::string Formatted;
+
+  std::string::iterator PrefixBegin;
+  std::string::iterator PrefixEnd;
+  std::string::iterator HexDigitBegin;
+  std::string::iterator HexDigitEnd;
+  std::string::iterator FloatExponentSeparatorBegin;
+  std::string::iterator FloatExponentSeparatorEnd;
+  std::string::iterator SuffixBegin;
+  std::string::iterator SuffixEnd;
+
+  void parse();
+  void applyFormatting();
+};
+
+} // namespace
+
+static char noOpTransform(char C) { return C; }
+
+static CharTransformFn getTransform(int8_t config_value) {
+  switch (config_value) {
+  case -1:
+    return llvm::toLower;
+  case 1:
+    return llvm::toUpper;
+  default:
+    return noOpTransform;
+  }
+}
+
+/// @brief Test if Suffix matches a C++ literal reserved by the library.
+/// Matches against all suffixes reserved in the C++23 standard
+static bool matchesReservedSuffix(StringRef Suffix) {
+  static const std::set<StringRef> ReservedSuffixes = {
+      "h", "min", "s", "ms", "us", "ns", "il", "i", "if", "d", "y",
+  };
+
+  return ReservedSuffixes.find(Suffix) != ReservedSuffixes.end();
+}
+
+FormatParameters::FormatParameters(
+    FormatStyle::LanguageKind Language,
+    const FormatStyle::NumericLiteralCaseStyle &CaseStyle)
+    : Prefix(getTransform(CaseStyle.PrefixCase)),
+      HexDigit(getTransform(CaseStyle.HexDigitCase)),
+      FloatExponentSeparator(
+          getTransform(CaseStyle.FloatExponentSeparatorCase)),
+      Suffix(getTransform(CaseStyle.SuffixCase)) {
+  switch (Language) {
+  case FormatStyle::LK_CSharp:
+  case FormatStyle::LK_Java:
+  case FormatStyle::LK_JavaScript:
+    Separator = '_';
+    break;
+  case FormatStyle::LK_C:
+  case FormatStyle::LK_Cpp:
+  case FormatStyle::LK_ObjC:
+  default:
+    Separator = '\'';
+  }
+}
+
+QuickNumericalConstantParser::QuickNumericalConstantParser(
+    const StringRef &IntegerLiteral, const FormatParameters &Transforms)
+    : IntegerLiteral(IntegerLiteral), Transforms(Transforms),
+      Formatted(IntegerLiteral), PrefixBegin(Formatted.begin()),
+      PrefixEnd(Formatted.begin()), HexDigitBegin(Formatted.begin()),
+      HexDigitEnd(Formatted.begin()),
+      FloatExponentSeparatorBegin(Formatted.begin()),
+      FloatExponentSeparatorEnd(Formatted.begin()),
+      SuffixBegin(Formatted.begin()), SuffixEnd(Formatted.begin()) {}
+
+void QuickNumericalConstantParser::parse() {
+  auto Cur = Formatted.begin();
+  auto End = Formatted.cend();
+
+  bool IsHex = false;
+  bool IsFloat = false;
+
+  // Find the range that contains the prefix.
+  PrefixBegin = Cur;
+  if (*Cur != '0') {
+  } else {
+    ++Cur;
+    const char C = *Cur;
+    switch (C) {
+    case 'x':
+    case 'X':
+      IsHex = true;
+      ++Cur;
+      break;
+    case 'b':
+    case 'B':
+      ++Cur;
+      break;
+    case 'o':
+    case 'O':
+      // JavaScript uses 0o as octal prefix.
+      ++Cur;
+      break;
+    default:
+      break;
+    }
+  }
+  PrefixEnd = Cur;
+
+  // Find the range that contains hex digits.
+  HexDigitBegin = Cur;
+  if (IsHex) {
+    while (Cur != End) {
+      const char C = *Cur;
+      if (llvm::isHexDigit(C)) {
+      } else if (C == Transforms.Separator) {
+      } else if (C == '.') {
+        IsFloat = true;
+      } else {
+        break;
+      }
+      ++Cur;
+    }
+  }
+  HexDigitEnd = Cur;
+  if (Cur == End)
+    return;
+
+  // Find the range that contains a floating point exponent separator.
+  // Hex digits have already been scanned through the decimal point.
+  // Decimal/octal/binary literals must fast forward through the decimal first.
+  if (!IsHex) {
+    while (Cur != End) {
+      const char C = *Cur;
+      if (llvm::isDigit(C)) {
+      } else if (C == Transforms.Separator) {
+      } else if (C == '.') {
+        IsFloat = true;
+      } else {
+        break;
+      }
+      ++Cur;
+    }
+  }
+
+  const char LSep = IsHex ? 'p' : 'e';
+  const char USep = IsHex ? 'P' : 'E';
+  // The next character of a floating point literal will either be the
+  // separator, or the start of a suffix.
+  FloatExponentSeparatorBegin = Cur;
+  if (IsFloat) {
+    const char C = *Cur;
+    if ((C == LSep) || (C == USep))
+      ++Cur;
+  }
+  FloatExponentSeparatorEnd = Cur;
+  if (Cur == End)
+    return;
+
+  // Fast forward through the exponent part of a floating point literal.
+  if (!IsFloat) {
+  } else if (FloatExponentSeparatorBegin == FloatExponentSeparatorEnd) {
+  } else {
+    while (Cur != End) {
+      const char C = *Cur;
+      if (llvm::isDigit(C)) {
+      } else if (C == '+') {
+      } else if (C == '-') {
+      } else {
+        break;
+      }
+      ++Cur;
+    }
+  }
+  if (Cur == End)
+    return;
+
+  // Find the range containing a suffix if any.
+  SuffixBegin = Cur;
+  size_t const SuffixLen = End - Cur;
+  StringRef suffix(&(*SuffixBegin), SuffixLen);
+  if (!matchesReservedSuffix(suffix)) {
+    while (Cur != End) {
+      const char C = *Cur;
+      if (C == '_') {
+        // In C++, it is idiomatic, but NOT standard to define user-defined
+        // literals with a leading '_'. Omit user defined literals from
+        // transformation.
+        break;
+      } else {
+      }
+      ++Cur;
+    }
+  }
+  SuffixEnd = Cur;
+}
+
+void QuickNumericalConstantParser::applyFormatting() {
+
+  auto Start = Formatted.cbegin();
+  auto End = Formatted.cend();
+
+  assert((Start <= PrefixBegin) && (End >= PrefixBegin) &&
+         "PrefixBegin is out of bounds");
+  assert((Start <= PrefixEnd) && (End >= PrefixEnd) &&
+         "PrefixEnd is out of bounds");
+  assert((Start <= HexDigitBegin) && (End >= HexDigitBegin) &&
+         "HexDigitBegin is out of bounds");
+  assert((Start <= HexDigitEnd) && (End >= HexDigitEnd) &&
+         "HexDigitEnd is out of bounds");
+  assert((Start <= FloatExponentSeparatorBegin) &&
+         (End >= FloatExponentSeparatorBegin) &&
+         "FloatExponentSeparatorBegin is out of bounds");
+  assert((Start <= FloatExponentSeparatorEnd) &&
+         (End >= FloatExponentSeparatorEnd) &&
+         "FloatExponentSeparatorEnd is out of bounds");
+  assert((Start <= SuffixBegin) && (End >= SuffixBegin) &&
+         "SuffixBegin is out of bounds");
+  assert((Start <= SuffixEnd) && (End >= SuffixEnd) &&
+         "SuffixEnd is out of bounds");
+
+  std::transform(PrefixBegin, PrefixEnd, PrefixBegin, Transforms.Prefix);
+  std::transform(HexDigitBegin, HexDigitEnd, HexDigitBegin,
+                 Transforms.HexDigit);
+  std::transform(FloatExponentSeparatorBegin, FloatExponentSeparatorEnd,
+                 FloatExponentSeparatorBegin,
+                 Transforms.FloatExponentSeparator);
+  std::transform(SuffixBegin, SuffixEnd, SuffixBegin, Transforms.Suffix);
+}
+
+std::optional<std::string> QuickNumericalConstantParser::formatIfNeeded() && {
+  parse();
+  applyFormatting();
+
+  return (Formatted == IntegerLiteral)
+             ? std::nullopt
+             : std::make_optional<std::string>(std::move(Formatted));
+}
+
+std::pair<tooling::Replacements, unsigned>
+NumericLiteralCaseFixer::process(const Environment &Env,
+                                 const FormatStyle &Style) {
+  switch (Style.Language) {
+  case FormatStyle::LK_C:
+  case FormatStyle::LK_Cpp:
+  case FormatStyle::LK_ObjC:
+  case FormatStyle::LK_CSharp:
+  case FormatStyle::LK_Java:
+  case FormatStyle::LK_JavaScript:
+    break;
+  default:
+    return {};
+  }
+
+  const auto &CaseStyle = Style.NumericLiteralCase;
+
+  const FormatStyle::NumericLiteralCaseStyle no_case_style{};
+  const bool SkipCaseFormatting = CaseStyle == no_case_style;
+
+  if (SkipCaseFormatting)
+    return {};
+
+  const FormatParameters Transforms{Style.Language, CaseStyle};
+
+  const auto &SourceMgr = Env.getSourceManager();
+  AffectedRangeManager AffectedRangeMgr(SourceMgr, Env.getCharRanges());
+
+  const auto ID = Env.getFileID();
+  const auto LangOpts = getFormattingLangOpts(Style);
+  Lexer Lex(ID, SourceMgr.getBufferOrFake(ID), SourceMgr, LangOpts);
+  Lex.SetCommentRetentionState(true);
+
+  Token Tok;
+  tooling::Replacements Result;
+  bool Skip = false;
+
+  while (!Lex.LexFromRawLexer(Tok)) {
+    // Skip tokens that are too small to contain a formattable literal.
+    auto Length = Tok.getLength();
+    if (Length < 2)
+      continue;
+
+    // Service clang-format off/on comments.
+    auto Location = Tok.getLocation();
+    auto Text = StringRef(SourceMgr.getCharacterData(Location), Length);
+    if (Tok.is(tok::comment)) {
+      if (isClangFormatOff(Text))
+        Skip = true;
+      else if (isClangFormatOn(Text))
+        Skip = false;
+      continue;
+    }
+
+    if (Skip || Tok.isNot(tok::numeric_constant) ||
+        !AffectedRangeMgr.affectsCharSourceRange(
+            CharSourceRange::getCharRange(Location, Tok.getEndLoc()))) {
+      continue;
+    }
+
+    const auto Formatted =
+        QuickNumericalConstantParser(Text, Transforms).formatIfNeeded();
+    if (Formatted) {
+      assert(*Formatted != Text && "QuickNumericalConstantParser returned an "
+                                   "unchanged value instead of nullopt");
+      cantFail(Result.add(
+          tooling::Replacement(SourceMgr, Location, Length, *Formatted)));
+    }
+  }
+
+  return {Result, 0};
+}
+
+} // namespace format
+} // namespace clang
diff --git a/clang/lib/Format/NumericLiteralCaseFixer.h b/clang/lib/Format/NumericLiteralCaseFixer.h
new file mode 100644
index 0000000000000..265d7343c468b
--- /dev/null
+++ b/clang/lib/Format/NumericLiteralCaseFixer.h
@@ -0,0 +1,32 @@
+//===--- NumericLiteralCaseFixer.h -------------------------*- C++ -*-===//
+//
+// Part of the LLVM Projec...
[truncated]
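
The range-splitting idea in the parser above can be illustrated outside clang-format with a much smaller sketch for decimal literals. This is not the patch's code: `applyCase`, `isReservedSuffix`, and `recaseDecimalLiteral` are hypothetical names, the reserved-suffix list is copied from `matchesReservedSuffix` in the diff, and two simplifications are assumed: an `e`/`E` directly after the digits is always treated as an exponent separator, and any suffix starting with `_` is left untouched in full. C++17 is assumed for `std::string_view`.

```cpp
#include <array>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not in the patch): -1 lowers, 1 uppers, 0 leaves as is.
static char applyCase(char C, int Mode) {
  const auto U = static_cast<unsigned char>(C);
  if (Mode < 0)
    return static_cast<char>(std::tolower(U));
  if (Mode > 0)
    return static_cast<char>(std::toupper(U));
  return C;
}

// Case-sensitive library suffixes, as listed in the patch.
static bool isReservedSuffix(std::string_view Suffix) {
  static constexpr std::array<std::string_view, 11> Reserved = {
      "h", "min", "s", "ms", "us", "ns", "il", "i", "if", "d", "y"};
  for (std::string_view R : Reserved)
    if (R == Suffix)
      return true;
  return false;
}

// Recase the exponent separator and suffix of a decimal literal.
std::string recaseDecimalLiteral(std::string Literal, int ExponentCase,
                                 int SuffixCase) {
  const std::size_t N = Literal.size();
  std::size_t I = 0;
  auto IsDigit = [&](std::size_t P) {
    return std::isdigit(static_cast<unsigned char>(Literal[P])) != 0;
  };
  // Mantissa: digits, the decimal point, and C++ digit separators.
  while (I < N && (IsDigit(I) || Literal[I] == '.' || Literal[I] == '\''))
    ++I;
  // Exponent separator, followed by an optional sign and digits.
  if (I < N && (Literal[I] == 'e' || Literal[I] == 'E')) {
    Literal[I] = applyCase(Literal[I], ExponentCase);
    ++I;
    while (I < N && (IsDigit(I) || Literal[I] == '+' || Literal[I] == '-'))
      ++I;
  }
  // Suffix: skip reserved library suffixes and '_'-prefixed user literals.
  const std::string_view Suffix = std::string_view(Literal).substr(I);
  if (!isReservedSuffix(Suffix) && (Suffix.empty() || Suffix.front() != '_'))
    for (; I < N; ++I)
      Literal[I] = applyCase(Literal[I], SuffixCase);
  return Literal;
}
```

The reserved-suffix check matters because a literal such as `30min` would stop compiling if the suffix were upper-cased, so the sketch, like the patch, leaves those suffixes alone.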

llvmbot commented Jul 31, 2025

@llvm/pr-subscribers-clang-format

Author: Andy MacGregor (30Wedge)

Changes

Some languages have the flexibility to use upper or lower case characters interchangeably in integer and float literal definitions.

I'd like to be able to enforce a consistent case style in one of my projects, so I added this clang-format style option to control it.

With this .clang-format configuration:

    NumericLiteralCase:
      PrefixCase: -1
      HexDigitCase: 1
      SuffixCase: -1

This line of code:

    unsigned long long  0XdEaDbEeFUll;

gets reformatted into this line of code:

    unsigned long long 0xDEADBEEFull;

I'm new to this project, so please let me know if I missed something in the process. I modeled this PR on IntegerLiteralSeparatorFixer.


Patch is 34.32 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/151590.diff

9 Files Affected:

  • (modified) clang/docs/ClangFormatStyleOptions.rst (+72-1)
  • (modified) clang/docs/ReleaseNotes.rst (+2)
  • (modified) clang/include/clang/Format/Format.h (+49)
  • (modified) clang/lib/Format/CMakeLists.txt (+1)
  • (modified) clang/lib/Format/Format.cpp (+19)
  • (added) clang/lib/Format/NumericLiteralCaseFixer.cpp (+368)
  • (added) clang/lib/Format/NumericLiteralCaseFixer.h (+32)
  • (modified) clang/unittests/Format/CMakeLists.txt (+1)
  • (added) clang/unittests/Format/NumericLiteralCaseTest.cpp (+354)
diff --git a/clang/docs/ClangFormatStyleOptions.rst b/clang/docs/ClangFormatStyleOptions.rst
index 02986a94a656c..abc73b0ae183c 100644
--- a/clang/docs/ClangFormatStyleOptions.rst
+++ b/clang/docs/ClangFormatStyleOptions.rst
@@ -4555,7 +4555,6 @@ the configuration (without a prefix: ``Auto``).
     So inserting a trailing comma counteracts bin-packing.
 
 
-
 .. _IntegerLiteralSeparator:
 
 **IntegerLiteralSeparator** (``IntegerLiteralSeparatorStyle``) :versionbadge:`clang-format 16` :ref:`¶ <IntegerLiteralSeparator>`
@@ -5076,6 +5075,78 @@ the configuration (without a prefix: ``Auto``).
 
   For example: TESTSUITE
 
+.. _NumericLiteralCase:
+
+**NumericLiteralCase** (``NumericLiteralCaseStyle``) :versionbadge:`clang-format 22` :ref:`¶ <NumericLiteralCase>`
+  Controls character case in numeric literals.
+
+  Possible values for each nested configuration flag:
+
+  * ``0`` (Default) Do not modify characters.
+
+  * ``-1`` Convert characters to lower case.
+
+  * ``1`` Convert characters to upper case.
+
+  .. code-block:: yaml
+
+    # Example of usage:
+    NumericLiteralCase:
+      PrefixCase: -1
+      HexDigitCase: 1
+      FloatExponentSeparatorCase: 0
+      SuffixCase: -1
+
+  .. code-block:: c++
+
+    // Lower case prefix, upper case hexadecimal digits, lower case suffix
+    unsigned int 0xDEADBEEFull;
+
+  Nested configuration flags:
+
+  * ``int PrefixCase`` Control numeric constant prefix case.
+
+    .. code-block:: c++
+
+      // PrefixCase: 1
+      int a = 0B101 | 0XF0;
+      // PrefixCase: -1
+      int b = 0b101 | 0xF0;
+      // PrefixCase: 0
+      int c = 0b101 | 0XF0;
+
+  * ``int HexDigitCase`` Control hexadecimal digit case.
+
+    .. code-block:: c++
+
+      // HexDigitCase: 1
+      int a = 0xBEAD;
+      // HexDigitCase: -1
+      int b = 0xbead;
+      // HexDigitCase: 0
+      int c = 0xBeAd;
+
+  * ``int FloatExponentSeparatorCase`` Control exponent separator case.
+
+    .. code-block:: c++
+
+      // FloatExponentSeparatorCase: 1
+      float a = 6.02E+23;
+      // FloatExponentSeparatorCase: -1
+      float b = 6.02e+23;
+
+  * ``int SuffixCase`` Control suffix case.
+
+    .. code-block:: c++
+
+      // SuffixCase: 1
+      unsigned long long a = 1ULL;
+      // SuffixCase: -1
+      unsigned long long b = 1ull;
+      // SuffixCase: 0
+      unsigned long long c = 1uLL;
+
+
 .. _ObjCBinPackProtocolList:
 
 **ObjCBinPackProtocolList** (``BinPackStyle``) :versionbadge:`clang-format 7` :ref:`¶ <ObjCBinPackProtocolList>`
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 4a2edae7509de..f45363f86c135 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -230,6 +230,8 @@ AST Matchers
 
 clang-format
 ------------
+- Add ``NumericLiteralCase`` option for enforcing character case in
+  numeric literals.
 
 libclang
 --------
diff --git a/clang/include/clang/Format/Format.h b/clang/include/clang/Format/Format.h
index 31582a40de866..301db5012b980 100644
--- a/clang/include/clang/Format/Format.h
+++ b/clang/include/clang/Format/Format.h
@@ -3100,6 +3100,54 @@ struct FormatStyle {
   /// \version 11
   TrailingCommaStyle InsertTrailingCommas;
 
+  /// Character case format for different components of a numeric literal.
+  ///
+  /// For all options, ``0`` leaves the case unchanged, ``-1``
+  /// uses lower case, and ``1`` uses upper case.
+  ///
+  struct NumericLiteralCaseStyle {
+    /// Format numeric constant prefixes.
+    /// \code{.text}
+    ///   /* -1: lower case */ b = 0x01;
+    ///   /*  0: don't care */
+    ///   /*  1: upper case */ b = 0X01;
+    /// \endcode
+    int8_t PrefixCase;
+    /// Format hexadecimal digit case.
+    /// \code{.text}
+    ///   /* -1: lower case */ b = 0xabcdef;
+    ///   /*  0: don't care */
+    ///   /*  1: upper case */ b = 0xABCDEF;
+    /// \endcode
+    int8_t HexDigitCase;
+    /// Format exponent separator character case in floating point literals.
+    /// \code{.text}
+    ///   /* -1: lower case */ b = 6.02e23;
+    ///   /*  0: don't care */
+    ///   /*  1: upper case */ b = 6.02E23;
+    /// \endcode
+    int8_t FloatExponentSeparatorCase;
+    /// Format suffix case. This option excludes case-specific reserved
+    /// suffixes, such as ``min`` in C++.
+    /// \code{.text}
+    ///   /* -1: lower case */ b = 10u;
+    ///   /*  0: don't care */
+    ///   /*  1: upper case */ b = 10U;
+    /// \endcode
+    int8_t SuffixCase;
+
+    bool operator==(const NumericLiteralCaseStyle &R) const {
+      return PrefixCase == R.PrefixCase && HexDigitCase == R.HexDigitCase &&
+             FloatExponentSeparatorCase == R.FloatExponentSeparatorCase &&
+             SuffixCase == R.SuffixCase;
+    }
+  };
+
+  /// Format numeric literals for languages that support flexible character case
+  /// in numeric literal constants.
+  /// \version 22
+  NumericLiteralCaseStyle NumericLiteralCase;
+
   /// Separator format of integer literals of different bases.
   ///
   /// If negative, remove separators. If  ``0``, leave the literal as is. If
@@ -5424,6 +5472,7 @@ struct FormatStyle {
            IndentWrappedFunctionNames == R.IndentWrappedFunctionNames &&
            InsertBraces == R.InsertBraces &&
            InsertNewlineAtEOF == R.InsertNewlineAtEOF &&
+           NumericLiteralCase == R.NumericLiteralCase &&
            IntegerLiteralSeparator == R.IntegerLiteralSeparator &&
            JavaImportGroups == R.JavaImportGroups &&
            JavaScriptQuotes == R.JavaScriptQuotes &&
diff --git a/clang/lib/Format/CMakeLists.txt b/clang/lib/Format/CMakeLists.txt
index 9f4939824fdb8..a003f1a951af6 100644
--- a/clang/lib/Format/CMakeLists.txt
+++ b/clang/lib/Format/CMakeLists.txt
@@ -13,6 +13,7 @@ add_clang_library(clangFormat
   MacroExpander.cpp
   MatchFilePath.cpp
   NamespaceEndCommentsFixer.cpp
+  NumericLiteralCaseFixer.cpp
   ObjCPropertyAttributeOrderFixer.cpp
   QualifierAlignmentFixer.cpp
   SortJavaScriptImports.cpp
diff --git a/clang/lib/Format/Format.cpp b/clang/lib/Format/Format.cpp
index 063780721423f..711a3e7501328 100644
--- a/clang/lib/Format/Format.cpp
+++ b/clang/lib/Format/Format.cpp
@@ -16,6 +16,7 @@
 #include "DefinitionBlockSeparator.h"
 #include "IntegerLiteralSeparatorFixer.h"
 #include "NamespaceEndCommentsFixer.h"
+#include "NumericLiteralCaseFixer.h"
 #include "ObjCPropertyAttributeOrderFixer.h"
 #include "QualifierAlignmentFixer.h"
 #include "SortJavaScriptImports.h"
@@ -382,6 +383,16 @@ struct ScalarEnumerationTraits<FormatStyle::IndentExternBlockStyle> {
   }
 };
 
+template <> struct MappingTraits<FormatStyle::NumericLiteralCaseStyle> {
+  static void mapping(IO &IO, FormatStyle::NumericLiteralCaseStyle &Base) {
+    IO.mapOptional("PrefixCase", Base.PrefixCase);
+    IO.mapOptional("HexDigitCase", Base.HexDigitCase);
+    IO.mapOptional("FloatExponentSeparatorCase",
+                   Base.FloatExponentSeparatorCase);
+    IO.mapOptional("SuffixCase", Base.SuffixCase);
+  }
+};
+
 template <> struct MappingTraits<FormatStyle::IntegerLiteralSeparatorStyle> {
   static void mapping(IO &IO, FormatStyle::IntegerLiteralSeparatorStyle &Base) {
     IO.mapOptional("Binary", Base.Binary);
@@ -1093,6 +1104,7 @@ template <> struct MappingTraits<FormatStyle> {
     IO.mapOptional("InsertBraces", Style.InsertBraces);
     IO.mapOptional("InsertNewlineAtEOF", Style.InsertNewlineAtEOF);
     IO.mapOptional("InsertTrailingCommas", Style.InsertTrailingCommas);
+    IO.mapOptional("NumericLiteralCase", Style.NumericLiteralCase);
     IO.mapOptional("IntegerLiteralSeparator", Style.IntegerLiteralSeparator);
     IO.mapOptional("JavaImportGroups", Style.JavaImportGroups);
     IO.mapOptional("JavaScriptQuotes", Style.JavaScriptQuotes);
@@ -1618,6 +1630,9 @@ FormatStyle getLLVMStyle(FormatStyle::LanguageKind Language) {
   LLVMStyle.InsertBraces = false;
   LLVMStyle.InsertNewlineAtEOF = false;
   LLVMStyle.InsertTrailingCommas = FormatStyle::TCS_None;
+  LLVMStyle.NumericLiteralCase = {/*PrefixCase=*/0, /*HexDigitCase=*/0,
+                                  /*FloatExponentSeparatorCase=*/0,
+                                  /*SuffixCase=*/0};
   LLVMStyle.IntegerLiteralSeparator = {
       /*Binary=*/0,  /*BinaryMinDigits=*/0,
       /*Decimal=*/0, /*DecimalMinDigits=*/0,
@@ -3872,6 +3887,10 @@ reformat(const FormatStyle &Style, StringRef Code,
     return IntegerLiteralSeparatorFixer().process(Env, Expanded);
   });
 
+  Passes.emplace_back([&](const Environment &Env) {
+    return NumericLiteralCaseFixer().process(Env, Expanded);
+  });
+
   if (Style.isCpp()) {
     if (Style.QualifierAlignment != FormatStyle::QAS_Leave)
       addQualifierAlignmentFixerPasses(Expanded, Passes);
diff --git a/clang/lib/Format/NumericLiteralCaseFixer.cpp b/clang/lib/Format/NumericLiteralCaseFixer.cpp
new file mode 100644
index 0000000000000..88adaf83fe381
--- /dev/null
+++ b/clang/lib/Format/NumericLiteralCaseFixer.cpp
@@ -0,0 +1,368 @@
+//===--- NumericLiteralCaseFixer.cpp -----------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// This file implements NumericLiteralCaseFixer that standardizes character
+/// case within numeric literal constants.
+///
+//===----------------------------------------------------------------------===//
+
+#include "NumericLiteralCaseFixer.h"
+
+#include "llvm/ADT/StringExtras.h"
+
+#include <algorithm>
+
+namespace clang {
+namespace format {
+
+using CharTransformFn = char (*)(char C);
+namespace {
+
+/// @brief Collection of std::transform unary operations for each part of a
+/// numeric literal.
+struct FormatParameters {
+  FormatParameters(FormatStyle::LanguageKind Language,
+                   const FormatStyle::NumericLiteralCaseStyle &CaseStyle);
+
+  CharTransformFn Prefix;
+  CharTransformFn HexDigit;
+  CharTransformFn FloatExponentSeparator;
+  CharTransformFn Suffix;
+
+  char Separator;
+};
+
+/// @brief Parse a single numeric constant from text into ranges that are
+/// appropriate for applying NumericLiteralCaseStyle rules.
+class QuickNumericalConstantParser {
+public:
+  QuickNumericalConstantParser(const StringRef &IntegerLiteral,
+                               const FormatParameters &Transforms);
+
+  /// @brief Reformats the numeric constant if needed.
+  /// Calling this method invalidates the object's state.
+  /// @return std::nullopt if no reformatting is required; otherwise a
+  /// std::optional<> containing the reformatted string.
+  std::optional<std::string> formatIfNeeded() &&;
+
+private:
+  const StringRef &IntegerLiteral;
+  const FormatParameters &Transforms;
+
+  std::string Formatted;
+
+  std::string::iterator PrefixBegin;
+  std::string::iterator PrefixEnd;
+  std::string::iterator HexDigitBegin;
+  std::string::iterator HexDigitEnd;
+  std::string::iterator FloatExponentSeparatorBegin;
+  std::string::iterator FloatExponentSeparatorEnd;
+  std::string::iterator SuffixBegin;
+  std::string::iterator SuffixEnd;
+
+  void parse();
+  void applyFormatting();
+};
+
+} // namespace
+
+static char noOpTransform(char C) { return C; }
+
+static CharTransformFn getTransform(int8_t config_value) {
+  switch (config_value) {
+  case -1:
+    return llvm::toLower;
+  case 1:
+    return llvm::toUpper;
+  default:
+    return noOpTransform;
+  }
+}
+
+/// @brief Test if Suffix matches a literal suffix reserved by the C++ standard
+/// library. Matches against all suffixes reserved in the C++23 standard.
+static bool matchesReservedSuffix(StringRef Suffix) {
+  static const std::set<StringRef> ReservedSuffixes = {
+      "h", "min", "s", "ms", "us", "ns", "il", "i", "if", "d", "y",
+  };
+
+  return ReservedSuffixes.find(Suffix) != ReservedSuffixes.end();
+}
+
+FormatParameters::FormatParameters(
+    FormatStyle::LanguageKind Language,
+    const FormatStyle::NumericLiteralCaseStyle &CaseStyle)
+    : Prefix(getTransform(CaseStyle.PrefixCase)),
+      HexDigit(getTransform(CaseStyle.HexDigitCase)),
+      FloatExponentSeparator(
+          getTransform(CaseStyle.FloatExponentSeparatorCase)),
+      Suffix(getTransform(CaseStyle.SuffixCase)) {
+  switch (Language) {
+  case FormatStyle::LK_CSharp:
+  case FormatStyle::LK_Java:
+  case FormatStyle::LK_JavaScript:
+    Separator = '_';
+    break;
+  case FormatStyle::LK_C:
+  case FormatStyle::LK_Cpp:
+  case FormatStyle::LK_ObjC:
+  default:
+    Separator = '\'';
+  }
+}
+
+QuickNumericalConstantParser::QuickNumericalConstantParser(
+    const StringRef &IntegerLiteral, const FormatParameters &Transforms)
+    : IntegerLiteral(IntegerLiteral), Transforms(Transforms),
+      Formatted(IntegerLiteral), PrefixBegin(Formatted.begin()),
+      PrefixEnd(Formatted.begin()), HexDigitBegin(Formatted.begin()),
+      HexDigitEnd(Formatted.begin()),
+      FloatExponentSeparatorBegin(Formatted.begin()),
+      FloatExponentSeparatorEnd(Formatted.begin()),
+      SuffixBegin(Formatted.begin()), SuffixEnd(Formatted.begin()) {}
+
+void QuickNumericalConstantParser::parse() {
+  auto Cur = Formatted.begin();
+  auto End = Formatted.cend();
+
+  bool IsHex = false;
+  bool IsFloat = false;
+
+  // Find the range that contains the prefix.
+  PrefixBegin = Cur;
+  if (*Cur != '0') {
+  } else {
+    ++Cur;
+    const char C = *Cur;
+    switch (C) {
+    case 'x':
+    case 'X':
+      IsHex = true;
+      ++Cur;
+      break;
+    case 'b':
+    case 'B':
+      ++Cur;
+      break;
+    case 'o':
+    case 'O':
+      // JavaScript uses 0o as octal prefix.
+      ++Cur;
+      break;
+    default:
+      break;
+    }
+  }
+  PrefixEnd = Cur;
+
+  // Find the range that contains hex digits.
+  HexDigitBegin = Cur;
+  if (IsHex) {
+    while (Cur != End) {
+      const char C = *Cur;
+      if (llvm::isHexDigit(C)) {
+      } else if (C == Transforms.Separator) {
+      } else if (C == '.') {
+        IsFloat = true;
+      } else {
+        break;
+      }
+      ++Cur;
+    }
+  }
+  HexDigitEnd = Cur;
+  if (Cur == End)
+    return;
+
+  // Find the range that contains a floating point exponent separator.
+  // Hex digits have already been scanned through the decimal point.
+  // Decimal/octal/binary literals must fast forward through the decimal first.
+  if (!IsHex) {
+    while (Cur != End) {
+      const char C = *Cur;
+      if (llvm::isDigit(C)) {
+      } else if (C == Transforms.Separator) {
+      } else if (C == '.') {
+        IsFloat = true;
+      } else {
+        break;
+      }
+      ++Cur;
+    }
+  }
+
+  const char LSep = IsHex ? 'p' : 'e';
+  const char USep = IsHex ? 'P' : 'E';
+  // The next character of a floating point literal will either be the
+  // separator, or the start of a suffix.
+  FloatExponentSeparatorBegin = Cur;
+  if (IsFloat && Cur != End) {
+    const char C = *Cur;
+    if ((C == LSep) || (C == USep))
+      ++Cur;
+  }
+  FloatExponentSeparatorEnd = Cur;
+  if (Cur == End)
+    return;
+
+  // Fast forward through the exponent part of a floating point literal.
+  if (!IsFloat) {
+  } else if (FloatExponentSeparatorBegin == FloatExponentSeparatorEnd) {
+  } else {
+    while (Cur != End) {
+      const char C = *Cur;
+      if (llvm::isDigit(C)) {
+      } else if (C == '+') {
+      } else if (C == '-') {
+      } else {
+        break;
+      }
+      ++Cur;
+    }
+  }
+  if (Cur == End)
+    return;
+
+  // Find the range containing a suffix if any.
+  SuffixBegin = Cur;
+  size_t const SuffixLen = End - Cur;
+  StringRef suffix(&(*SuffixBegin), SuffixLen);
+  if (!matchesReservedSuffix(suffix)) {
+    while (Cur != End) {
+      const char C = *Cur;
+      if (C == '_') {
+        // In C++, user-defined literal suffixes must begin with '_'; suffixes
+        // without it are reserved for the standard library. Omit user-defined
+        // literals from transformation.
+        break;
+      } else {
+      }
+      ++Cur;
+    }
+  }
+  SuffixEnd = Cur;
+}
+
+void QuickNumericalConstantParser::applyFormatting() {
+
+  auto Start = Formatted.cbegin();
+  auto End = Formatted.cend();
+
+  assert((Start <= PrefixBegin) && (End >= PrefixBegin) &&
+         "PrefixBegin is out of bounds");
+  assert((Start <= PrefixEnd) && (End >= PrefixEnd) &&
+         "PrefixEnd is out of bounds");
+  assert((Start <= HexDigitBegin) && (End >= HexDigitBegin) &&
+         "HexDigitBegin is out of bounds");
+  assert((Start <= HexDigitEnd) && (End >= HexDigitEnd) &&
+         "HexDigitEnd is out of bounds");
+  assert((Start <= FloatExponentSeparatorBegin) &&
+         (End >= FloatExponentSeparatorBegin) &&
+         "FloatExponentSeparatorBegin is out of bounds");
+  assert((Start <= FloatExponentSeparatorEnd) &&
+         (End >= FloatExponentSeparatorEnd) &&
+         "FloatExponentSeparatorEnd is out of bounds");
+  assert((Start <= SuffixBegin) && (End >= SuffixBegin) &&
+         "SuffixBegin is out of bounds");
+  assert((Start <= SuffixEnd) && (End >= SuffixEnd) &&
+         "SuffixEnd is out of bounds");
+
+  std::transform(PrefixBegin, PrefixEnd, PrefixBegin, Transforms.Prefix);
+  std::transform(HexDigitBegin, HexDigitEnd, HexDigitBegin,
+                 Transforms.HexDigit);
+  std::transform(FloatExponentSeparatorBegin, FloatExponentSeparatorEnd,
+                 FloatExponentSeparatorBegin,
+                 Transforms.FloatExponentSeparator);
+  std::transform(SuffixBegin, SuffixEnd, SuffixBegin, Transforms.Suffix);
+}
+
+std::optional<std::string> QuickNumericalConstantParser::formatIfNeeded() && {
+  parse();
+  applyFormatting();
+
+  return (Formatted == IntegerLiteral)
+             ? std::nullopt
+             : std::make_optional<std::string>(std::move(Formatted));
+}
+
+std::pair<tooling::Replacements, unsigned>
+NumericLiteralCaseFixer::process(const Environment &Env,
+                                 const FormatStyle &Style) {
+  switch (Style.Language) {
+  case FormatStyle::LK_C:
+  case FormatStyle::LK_Cpp:
+  case FormatStyle::LK_ObjC:
+  case FormatStyle::LK_CSharp:
+  case FormatStyle::LK_Java:
+  case FormatStyle::LK_JavaScript:
+    break;
+  default:
+    return {};
+  }
+
+  const auto &CaseStyle = Style.NumericLiteralCase;
+
+  const FormatStyle::NumericLiteralCaseStyle no_case_style{};
+  const bool SkipCaseFormatting = CaseStyle == no_case_style;
+
+  if (SkipCaseFormatting)
+    return {};
+
+  const FormatParameters Transforms{Style.Language, CaseStyle};
+
+  const auto &SourceMgr = Env.getSourceManager();
+  AffectedRangeManager AffectedRangeMgr(SourceMgr, Env.getCharRanges());
+
+  const auto ID = Env.getFileID();
+  const auto LangOpts = getFormattingLangOpts(Style);
+  Lexer Lex(ID, SourceMgr.getBufferOrFake(ID), SourceMgr, LangOpts);
+  Lex.SetCommentRetentionState(true);
+
+  Token Tok;
+  tooling::Replacements Result;
+  bool Skip = false;
+
+  while (!Lex.LexFromRawLexer(Tok)) {
+    // Skip tokens that are too small to contain a formattable literal.
+    auto Length = Tok.getLength();
+    if (Length < 2)
+      continue;
+
+    // Service clang-format off/on comments.
+    auto Location = Tok.getLocation();
+    auto Text = StringRef(SourceMgr.getCharacterData(Location), Length);
+    if (Tok.is(tok::comment)) {
+      if (isClangFormatOff(Text))
+        Skip = true;
+      else if (isClangFormatOn(Text))
+        Skip = false;
+      continue;
+    }
+
+    if (Skip || Tok.isNot(tok::numeric_constant) ||
+        !AffectedRangeMgr.affectsCharSourceRange(
+            CharSourceRange::getCharRange(Location, Tok.getEndLoc()))) {
+      continue;
+    }
+
+    const auto Formatted =
+        QuickNumericalConstantParser(Text, Transforms).formatIfNeeded();
+    if (Formatted) {
+      assert(*Formatted != Text && "QuickNumericalConstantParser returned an "
+                                   "unchanged value instead of nullopt");
+      cantFail(Result.add(
+          tooling::Replacement(SourceMgr, Location, Length, *Formatted)));
+    }
+  }
+
+  return {Result, 0};
+}
+
+} // namespace format
+} // namespace clang
diff --git a/clang/lib/Format/NumericLiteralCaseFixer.h b/clang/lib/Format/NumericLiteralCaseFixer.h
new file mode 100644
index 0000000000000..265d7343c468b
--- /dev/null
+++ b/clang/lib/Format/NumericLiteralCaseFixer.h
@@ -0,0 +1,32 @@
+//===--- NumericLiteralCaseFixer.h -------------------------*- C++ -*-===//
+//
+// Part of the LLVM Projec...
[truncated]

Contributor

@JustinStitt JustinStitt left a comment

I left a few comments, most of them nits or typos. This isn't a code area I've reviewed much hence the minimal code comments.

Looks great though!

@JustinStitt
Contributor

JustinStitt commented Jul 31, 2025

Looking at the CI it seems the clang/test/Format/docs_updated.test test failed. This may be due to incorrect formatting of your style option in ClangFormatStyleOptions.rst but I am not actually sure. Check it out, though.

Contributor

@HazardyKnusperkeks HazardyKnusperkeks left a comment

A bit to do, but I like the proposed feature.

@HazardyKnusperkeks
Contributor

> Looking at the CI it seems the clang/test/Format/docs_updated.test test failed. This may be due to incorrect formatting of your style option in ClangFormatStyleOptions.rst but I am not actually sure. Check it out, though.

Most likely the docs were not generated with the script, but manually, otherwise it wouldn't mismatch with version 21 vs 22.

@30Wedge
Author

30Wedge commented Aug 2, 2025

> Most likely the docs were not generated with the script, but manually

Yup! I manually edited ClangFormatStyleOptions.rst in the first commit. Much easier to have a script do it for you 🙂

In the fixup, this file is regenerated with: python clang/docs/tools/dump_format_style.py

Contributor

@JustinStitt JustinStitt left a comment

Again, can't give any comprehensive code critiques but you addressed my nits and such so LGTM.


github-actions bot commented Aug 3, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@owenca
Contributor

owenca commented Aug 3, 2025

Please wait for @HazardyKnusperkeks and @mydeveloperday.

@owenca owenca requested a review from mydeveloperday August 3, 2025 08:38
Contributor

@HazardyKnusperkeks HazardyKnusperkeks left a comment

I'm not finished reviewing the changes, but must leave now. ;)

@owenca
Contributor

owenca commented Aug 3, 2025

I suggest NumericLiteralCase, Prefix, HexDigit, ExponentLetter, and Suffix for the option names and Leave, Lower, and Upper for the enum values. For example:

NumericLiteralCase:
  Prefix:         Lower # 0x, 0b, etc.
  HexDigit:       Upper # ABCDEF
  ExponentLetter: Lower # e and p
  Suffix:         Lower # ull, bf16, etc.
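A minimal sketch of how the suggested `Leave`/`Lower`/`Upper` values could drive the recasing of one hexadecimal integer literal. All names here (`LiteralCase`, `NumericLiteralCaseSketch`, `applyCase`, `recaseHexInteger`) are hypothetical and are not taken from the PR; `ExponentLetter`, floating-point literals, and user-defined literals are left out to keep it short.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical names throughout; this is not the code from the PR.
enum class LiteralCase { Leave, Lower, Upper };

struct NumericLiteralCaseSketch {
  LiteralCase Prefix = LiteralCase::Leave;
  LiteralCase HexDigit = LiteralCase::Leave;
  LiteralCase Suffix = LiteralCase::Leave;
};

// Recase the half-open range [Begin, End) of Text according to Case.
static void applyCase(std::string &Text, std::size_t Begin, std::size_t End,
                      LiteralCase Case) {
  if (Case == LiteralCase::Leave)
    return;
  for (std::size_t I = Begin; I < End; ++I) {
    const auto C = static_cast<unsigned char>(Text[I]);
    Text[I] = static_cast<char>(Case == LiteralCase::Lower ? std::tolower(C)
                                                           : std::toupper(C));
  }
}

// Handles only hexadecimal integer literals (0x... / 0X...) with C++ digit
// separators; anything else is returned unchanged.
std::string recaseHexInteger(std::string Literal,
                             const NumericLiteralCaseSketch &Style) {
  if (Literal.size() < 3 || Literal[0] != '0' ||
      (Literal[1] != 'x' && Literal[1] != 'X')) {
    return Literal;
  }
  // Everything after the two-character prefix that is a hex digit or a
  // separator belongs to the digit range; the rest is treated as the suffix.
  std::size_t DigitsEnd = 2;
  while (DigitsEnd < Literal.size() &&
         (std::isxdigit(static_cast<unsigned char>(Literal[DigitsEnd])) ||
          Literal[DigitsEnd] == '\'')) {
    ++DigitsEnd;
  }
  applyCase(Literal, 0, 2, Style.Prefix);
  applyCase(Literal, 2, DigitsEnd, Style.HexDigit);
  applyCase(Literal, DigitsEnd, Literal.size(), Style.Suffix);
  return Literal;
}
```

With `Prefix: Lower`, `HexDigit: Upper`, and `Suffix: Lower`, calling `recaseHexInteger("0XdEaDbEeFUll", Style)` returns `0xDEADBEEFull`; with every field left at `Leave` the literal comes back unchanged.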

@HazardyKnusperkeks
Contributor

> I suggest NumericLiteralCase, Prefix, HexDigit, ExponentLetter, and Suffix for the option names and Leave, Lower, and Upper for the enum values. For example:
>
> NumericLiteralCase:
>   Prefix:         Lower # 0x, 0b, etc.
>   HexDigit:       Upper # ABCDEF
>   ExponentLetter: Lower # e and p
>   Suffix:         Lower # ull, bf16, etc.

That is way better.

@30Wedge
Author

30Wedge commented Aug 4, 2025

> I suggest NumericLiteralCase, Prefix, HexDigit, ExponentLetter, and Suffix for the option names and Leave, Lower, and Upper for the enum values.

I agree. So much more concise.

Contributor

@owenca owenca left a comment

It seems that we don't need to add a separate formatting pass for this new option as changing the case of letters in numeric literals has no impact on any existing passes. IMO, the best place to handle this is in FormatTokenLexer::getNextToken(). For example:

--- a/clang/lib/Format/FormatTokenLexer.cpp
+++ b/clang/lib/Format/FormatTokenLexer.cpp
@@ -1313,6 +1313,9 @@ FormatToken *FormatTokenLexer::getNextToken() {
     }
     WhitespaceLength += Text.size();
     readRawToken(*FormatTok);
+    if (FormatTok->Finalized || FormatTok->isNot(tok::numeric_constant))
+      continue;
+    // Handle Style.NumericLiteralCase here.
   }
 
   if (FormatTok->is(tok::unknown))

@30Wedge
Author

30Wedge commented Aug 5, 2025

> IMO, the best place to handle this is in FormatTokenLexer::getNextToken().

I see you have much more experience in this part of the codebase, but I have some hangups because I don't understand the implications of doing this move myself. Here's what I'm thinking; are these valid concerns?

The FormatTokenLexer class seems like it does the work of parsing a file into tokens for use downstream by all of the other formatters. What are the implications of running NumericLiteralCaseFixer or any other reformatting at this stage for all consumers? Do consumers of lexed FormatTokens assume they are receiving a faithful representation of the underlying file? Are there architecture issues regarding separation of concerns that come up if we do formatting directly in the lexing stage?

Separately, I think that all classes that subclass TokenAnalyzer and use TokenAnalyzer::process() would wind up calling FormatTokenLexer::lex() -- which would end up running NumericLiteralCaseFixer reformatting redundantly for each other pass. Please correct me if I'm reading this wrong.
I could see how that is still lower overhead than adding a totally separate pass, but it still seems odd to run the same reformatting function many separate times in unrelated passes.
Maybe it would be better to add NumericLiteralCaseFixer into some other existing pass instead?

@owenca
Contributor

owenca commented Aug 6, 2025

> > IMO, the best place to handle this is in FormatTokenLexer::getNextToken().
>
> I see you have much more experience in this part of the codebase, but I have some hangups because I don't understand the implications of doing this move myself. Here's what I'm thinking; are these valid concerns?
>
> The FormatTokenLexer class seems like it does the work of parsing a file into tokens for use downstream by all of the other formatters. What are the implications of running NumericLiteralCaseFixer or any other reformatting at this stage for all consumers? Do consumers of lexed FormatTokens assume they are receiving a faithful representation of the underlying file? Are there architecture issues regarding separation of concerns that come up if we do formatting directly in the lexing stage?
>
> Separately, I think that all classes that subclass TokenAnalyzer and use TokenAnalyzer::process() would wind up calling FormatTokenLexer::lex() -- which would end up running NumericLiteralCaseFixer reformatting redundantly for each other pass. Please correct me if I'm reading this wrong. I could see how that is still lower overhead than adding a totally separate pass, but it still seems odd to run the same reformatting function many separate times in unrelated passes. Maybe it would be better to add NumericLiteralCaseFixer into some other existing pass instead?

You are absolutely right! I was wrong about handling NumericLiteralCase in FormatTokenLexer and totally agree with you that it'd be better to do that in an existing pass.

@30Wedge
Author

30Wedge commented Aug 7, 2025

Cool, thanks for hearing me out! I am working on handling NumericLiteralCase in the same pass as IntegerLiteralSeparatorFixer; that seemed natural since they both only modify single numeric_constant tokens. I won't have time to get to it until next week.

@HazardyKnusperkeks
Contributor

Please mark discussions as resolved once you have made the requested change.

@owenca
Contributor

owenca commented Aug 11, 2025

> Cool, thanks for hearing me out! I am working on handling NumericLiteralCase in the same pass as IntegerLiteralSeparatorFixer; that seemed natural since they both only modify single numeric_constant tokens. I won't have time to get to it until next week.

I'd handle NumericLiteralCase in a non-optional pass instead.

@30Wedge
Author

30Wedge commented Aug 12, 2025

> I'd handle NumericLiteralCase in a non-optional pass instead.

I don't understand what this means. This seems like the only unconditional pass in clang::format::internal::reformat except for the Formatter pass. Could I have a specific example of a non-optional pass please?

@owenca
Contributor

owenca commented Aug 13, 2025

> > I'd handle NumericLiteralCase in a non-optional pass instead.
>
> I don't understand what this means. This seems like the only unconditional pass in clang::format::internal::reformat except for the Formatter pass. Could I have a specific example of a non-optional pass please?

The IntegerLiteralSeparatorFixer is an optional pass in the sense that the input is not tokenized if the option is not activated. That leaves the Formatter as the only non-optional pass. However, it doesn't seem natural to embed the NumericLiteralCaseFixer into the Formatter, so please ignore my request for not making a separate pass for the former.
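To illustrate what "optional" means here, the following is a rough sketch of a pass that returns before scanning its input when the option is inactive, much like the early `return {}` in `NumericLiteralCaseFixer::process` in the diff above. `CaseOptionSketch`, `CaseFixerSketch`, and `ScanCount` are hypothetical names, and the whitespace-based scan is only a stand-in for real lexing.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical, simplified option: 0 means "leave as is", 1 means upper case,
// -1 means lower case, mirroring the int8_t convention in the diff above.
struct CaseOptionSketch {
  int HexDigitCase = 0;
};

struct CaseFixerSketch {
  int ScanCount = 0; // Number of times the input was actually scanned.

  std::string process(const std::string &Code, const CaseOptionSketch &Style) {
    // "Optional" pass: bail out before touching the input when inactive.
    if (Style.HexDigitCase == 0)
      return Code;

    ++ScanCount;
    std::string Result = Code;
    for (std::size_t I = 0; I + 1 < Result.size(); ++I) {
      // Crude token boundary: start of input or preceded by whitespace.
      const bool AtTokenStart =
          I == 0 || std::isspace(static_cast<unsigned char>(Result[I - 1]));
      if (!AtTokenStart || Result[I] != '0' ||
          (Result[I + 1] != 'x' && Result[I + 1] != 'X')) {
        continue;
      }
      // Recase the run of hex digits that follows the 0x prefix.
      for (std::size_t J = I + 2;
           J < Result.size() &&
           std::isxdigit(static_cast<unsigned char>(Result[J]));
           ++J) {
        const auto C = static_cast<unsigned char>(Result[J]);
        Result[J] = static_cast<char>(Style.HexDigitCase > 0 ? std::toupper(C)
                                                             : std::tolower(C));
      }
    }
    return Result;
  }
};
```

With the option left at `0`, `process` hands the code back untouched and `ScanCount` stays at zero, which is the sense in which the pass costs nothing when it is not configured.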

@30Wedge 30Wedge force-pushed the format-integer-case branch from b562309 to 8b94bb6 Compare August 13, 2025 20:16
@30Wedge 30Wedge requested a review from owenca August 13, 2025 20:21
@owenca
Contributor

owenca commented Aug 14, 2025

Please run git clang-format or build clang-format-check-format before git push.

@30Wedge 30Wedge requested a review from owenca August 15, 2025 02:05
Labels: clang, clang-format